Abstract

We present force-clamp data on the collapse of ubiquitin polyproteins 
in response to a quench in the force. These nonequilibrium trajectories are 
analyzed using a general method based on a diffusive assumption of the 
end-to-end length to reconstruct a downhill free energy profile at 5 pN and
an energy plateau at 10 pN with a slow diffusion coefficient on the order
of 100 nm$^2$ s$^{-1}$. The shape of the free energy and its linear scaling with the
protein length give validity to a physical model for the collapse. However, 
the length independent diffusion coefficient suggests that internal rather 
than viscous friction dominates and thermal noise is needed to capture 
the variability in the measured times to collapse. 

By measuring the end-to-end length of proteins and RNA in response to force
perturbations, single molecule experiments open a window into the complex
dynamics of these molecules on their multi-dimensional energy potentials [?].
For example, a protein is unfolded by the application of a constant pulling force,
while quenching the force to a low value triggers the hydrophobic collapse of the
molecule [2, ?]. This dynamical collapse has been modeled as a one-dimensional
diffusion of the measured end-to-end length on a free energy profile in the case
of protein monomers [?] and RNA molecules [?]. By contrast, dynamics in
degrees of freedom hidden from the experiment were thought to govern the large
diversity in the end-to-end length of trajectories visited by collapsing
polyproteins [?]. Whether the experimental distribution of trajectories can be described
by simple diffusion along the measured reaction coordinate or requires multiple
dimensions remains an open question that requires novel analysis tools.

This question is non-trivial because the collapsing traces are out of equilibrium,
and standard techniques to reconstruct the free energy profile based on the
Jarzynski equality [1] or Crooks' fluctuation theorem [?] are not applicable to
the force quench experimental protocol. Indeed, these techniques rely on
knowing the statistics of the work exerted on the system [10, ?]. In force-clamp
experiments this work is concentrated in the brief time it takes to quench
the force (~50 ms), which would require a prohibitively large pool of data to
ensure the statistical accuracy of the free energy estimator based on the
Jarzynski equality [14, ?]. A second difficulty is that the free energy alone
is not sufficient to describe the dynamics of the collapse. If the collapse can
be described by an overdamped Langevin equation for the end-to-end length of
the protein [?], then a diffusion coefficient must be estimated in addition to the
free energy [18, 19].

Here we introduce an analysis method to reconstruct the free energy
profile [20] directly from the collapse trajectories of ubiquitin polyproteins,
assuming diffusive dynamics. By reconstructing the free energy for polyprotein
chains with varying numbers of protein domains, we quantify to what extent the
collapse mechanism is cooperative between the domains [21]. Moreover, the
observation that increasing the quench force slows down the collapse process [1, ?]
is explained in terms of the shape of the reconstructed free energy landscape,
which in turn tests the Bell model [22] with no adjustable parameters. We then
present an extension to the approach that offers the first measurement of an
effective diffusion coefficient of a collapsing polypeptide and tests its constancy
along the measured reaction coordinate. Finally, we propose a microscopic
origin for the observed collapse in terms of the worm-like chain and 'expanding
sausage' models [?].

We use Atomic Force Microscopy (AFM) in the force-clamp mode to follow
the unfolding and refolding trajectories of ubiquitin polyproteins under a
constant stretching force, as shown in the example in Fig. 1. Exposing a
mechanically stable protein to a high pulling force of 110 pN leads to the stepwise
unfolding and extension of each of the three protein domains in the polypeptide
chain. Subsequently, quenching the force to a lower value of 10 pN triggers the
collapse of the whole protein from a fully extended state back to a collapsed
state with the same end-to-end length as the folded protein. Previous
experiments have shown that the final state of the collapse process is not
a mechanically stable folded protein, but a random compact globule that forms
native contacts over time [23]. A second pull on the same protein at 110 pN
leads to a second unfolding, as shown in the trajectory. Here we analyze only
those trajectories that exhibit a minimum of three steps of ~20 nm in the initial
staircase as a signature of the extension of individual ubiquitin domains upon
unfolding, as well as a second staircase to signify refolding. The question is then
to understand the mechanism of the collapse dynamics from many recordings
($n_{tot} \sim 100$) of these trajectories.

Theoretically, if we denote by $x$ the end-to-end length, the overdamped
Langevin equation reads

$$\dot{x} = -\beta D\, G'(x) + \sqrt{2D}\,\eta(t) \qquad (1)$$

where $\beta = 1/(k_B T)$, $\eta(t)$ is a white-noise term accounting for thermal effects,
$G(x)$ is the equilibrium free energy profile and $D$ is the diffusion coefficient,
which we assume to be constant (this assumption is validated below). Both
$G(x)$ and $D$, or the friction coefficient $\gamma$ since $D = k_B T/\gamma$, can be estimated
from the collapsing traces using the techniques introduced in [20].
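As an illustration of the dynamics in Eq. (1), the following is a minimal Euler-Maruyama sketch, not the authors' implementation. It assumes energies in units of $k_B T$ (so $\beta = 1$), lengths in nm and times in s. The function name `simulate_collapse`, the time step, and the reflecting wall at $x_u$ are assumptions made for the example.

```python
import numpy as np

def simulate_collapse(dG_dx, D, x_u, x_f, dt=1e-4, t_max=10.0, rng=None):
    """Euler-Maruyama integration of dx/dt = -D*G'(x) + sqrt(2D)*eta(t).

    G is in units of k_B T (so beta = 1), lengths in nm, times in s.
    The trace starts at x_u and is cut at its first passage below x_f.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_max = int(round(t_max / dt))
    # Pre-draw the thermal kicks; each has variance 2*D*dt.
    noise = np.sqrt(2.0 * D * dt) * rng.standard_normal(n_max)
    x = np.empty(n_max + 1)
    x[0] = x_u
    for i in range(n_max):
        x_new = x[i] - D * dG_dx(x[i]) * dt + noise[i]
        if x_new > x_u:
            x_new = 2.0 * x_u - x_new  # assumed reflecting wall at x_u
        x[i + 1] = x_new
        if x_new <= x_f:
            return x[: i + 2]          # collapsed: stop at first passage
    return x                           # did not collapse within t_max

# Example: constant downhill slope of 0.5 k_B T per nm, D = 100 nm^2/s.
trace = simulate_collapse(lambda x: 0.5, D=100.0, x_u=60.0, x_f=4.0,
                          rng=np.random.default_rng(0))
collapse_time = (len(trace) - 1) * 1e-4  # in seconds
```

The time step should be small enough that a single kick, of size $\sqrt{2D\,dt}$, is well below the length scale over which $G(x)$ varies.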

Let us consider the free energy first. The procedure to calculate $G(x)$ from
the collapsing traces is to cut out pieces of trajectories from the moment the
force is quenched at the unfolded length, $x_u$, until the moment they first reach
the folded length at low force, $x_f$. This allows us to estimate via binning a
nonequilibrium stationary probability density $p(x)$ of many collapsing
trajectories, and relate $G(x)$ to it for $x \in [x_f, x_u]$ as follows:

$$G(x) = -k_B T \ln p(x) - k_B T\, p'(x_f) \int_x^{x_u} \frac{dx'}{p(x')} \qquad (2)$$

where $p'(x_f)$ denotes the derivative of $p(x)$ evaluated at $x_f$. Note that this
formula is different from the standard $G(x) = -k_B T \ln p_e(x)$, where $p_e(x)$ is the
equilibrium probability density function. The nonequilibrium $p(x)$ requires an
additional term besides $-k_B T \ln p(x)$ in Eq. (2) to relate it to $G(x)$. This extra
term corrects for the fact that $p(x)$ is biased towards values of $x$ that are closer
to $x_u$, where the trajectories are initiated by the protocol. For a detailed
derivation of Eq. (2) we refer the reader to [20], where this formula is also compared to
Bayesian inference methods [18, 24, 25]. Using Eq. (2) is advantageous because
the chronological order in which the data are acquired does not play a role in
the binning procedure, which implies that the time resolution of the instrument
(~5 ms) has no impact on the resulting landscape.
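The binning procedure behind Eq. (2) can be sketched as follows. This is an illustrative reading under stated assumptions rather than the procedure of [20]: the integral is taken from $x$ up to $x_u$, $p'(x_f)$ is approximated from the first bin using $p(x_f) = 0$, and the integral is evaluated with a midpoint rule. The function name and bin count are hypothetical.

```python
import numpy as np

def reconstruct_free_energy(samples, x_f, x_u, n_bins=40):
    """Return bin centers and G(x)/k_B T from pooled positions of cut traces.

    `samples` holds every recorded position of every trace, each trace cut
    between the quench (at x_u) and its first passage at x_f.
    """
    edges = np.linspace(x_f, x_u, n_bins + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    p, _ = np.histogram(samples, bins=edges, density=True)
    if np.any(p == 0):
        raise ValueError("empty bins: use fewer bins or more data")
    dx = edges[1] - edges[0]
    # p vanishes at the absorbing end x_f, so the slope there is roughly
    # the first-bin density divided by the distance to the bin center.
    dp_f = p[0] / (0.5 * dx)
    inv_p = 1.0 / p
    # Integral of dx'/p(x') from each bin center up to x_u:
    # half of the bin itself plus all bins above it.
    tail = (np.cumsum(inv_p[::-1])[::-1] - 0.5 * inv_p) * dx
    G = -np.log(p) - dp_f * tail
    return centers, G - G.min()
```

As a sanity check, a flat landscape with injection at $x_u$ and absorption at $x_f$ has a linear stationary density $p(x) \propto x - x_f$, for which the two terms cancel and the reconstructed $G(x)$ is constant. The bin adjacent to $x_f$ is the least reliable, since $1/p$ varies rapidly there.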

Next we apply Eq. (2) to analyze force-clamp trajectories, such as the one shown
in Figure 1. Since the length of the polyprotein chain and the polypeptide linker
to the surface vary from one experiment to the next, we compare all trajectories
in terms of the total length of the collapse $L_{tot} = x_u - x_f$. We find that $L_{tot}$
clusters in increments of a monomer ubiquitin length of ~20 nm with a standard
deviation of ~6 nm. We therefore group the clusters of similar collapse lengths
and estimate the number of domains in the polyprotein chain as $N_d = L_{tot}/20\,\mathrm{nm}$
to the nearest integer. Setting the lowest value of $L_{tot}$ within a group of a given
$N_d$ to be $x_u$ at time zero and $x_f$ to 4 nm [26], the folded length of the protein
from the Protein Data Bank, leads to the alignments of trajectories shown in
Figs. 2A and 2B for the 10 pN and 5 pN force quench, respectively, in the group
of $N_d = 3$. Analyzing trajectories in groups segregated by $N_d$, we measure
the non-equilibrium distribution $p(x)$ of the end-to-end length for each $N_d$, as
shown in Fig. 3. We find that they approximately scale linearly with $N_d$ at
both forces, as shown in the insets. At a quench force of 10 pN, the extended
polypeptides often plateau at ~70% of the contour length before their final
collapse. Lowering the quench force to 5 pN reveals faster collapse trajectories
that visit all end-to-end lengths with a similar probability.
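The grouping step described above amounts to rounding $L_{tot}/20\,\mathrm{nm}$ to the nearest integer. A minimal sketch, with hypothetical function names and made-up input lengths, could look like this:

```python
import numpy as np

MONOMER_LENGTH_NM = 20.0  # approximate extension per ubiquitin domain (from the text)

def count_domains(x_u, x_f):
    """Estimate N_d for each trace as L_tot / 20 nm, rounded to the nearest integer."""
    L_tot = np.asarray(x_u, float) - np.asarray(x_f, float)
    return np.rint(L_tot / MONOMER_LENGTH_NM).astype(int)

def group_by_domains(x_u, x_f):
    """Map each N_d to the indices of the traces assigned to it."""
    n_d = count_domains(x_u, x_f)
    return {int(n): np.flatnonzero(n_d == n) for n in np.unique(n_d)}

# Illustrative (made-up) unfolded and folded lengths in nm for five traces.
groups = group_by_domains([45, 66, 62, 84, 27], [4, 4, 4, 4, 4])
```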

Using the observed distributions, we then obtain $G_{N_d}(x)$, the free energy of
a polyprotein of $N_d$ units, and collapse these different profiles onto one another
using the rescaling

$$G(x + x_f) = G_{N_d=1}(x + x_f) = \frac{1}{N_d}\, G_{N_d}\big(N_d (x + x_f)\big) \qquad (3)$$

This cooperativity between the domains is inconsistent with previously proposed
models for the stochastic refolding of individual domains [27] or the aggregation
of the unfolded domains [28]. Instead, our result in Eq. (3) suggests a global collapse
of the polypeptide chain due to an attraction between hydrophobic residues that
does not directly lead to folding [23, 29, 30].
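One way to test such a collapse of profiles numerically is to divide both the length axis and the free energy by $N_d$ and measure the residual spread between the rescaled curves. The sketch below is an assumption-laden illustration: the function names are hypothetical, and the `origin` argument is introduced here so that the length axis can be rescaled either about zero or about $x_f$, depending on how the shift in Eq. (3) is read.

```python
import numpy as np

def rescale_profile(x, G, n_d, origin=0.0):
    """Divide lengths (measured from `origin`) and free energy by n_d."""
    x = np.asarray(x, float)
    return origin + (x - origin) / n_d, np.asarray(G, float) / n_d

def collapse_spread(profiles, grid, origin=0.0):
    """Largest disagreement between rescaled profiles on a common grid.

    `profiles` maps N_d to a pair (x, G) with x increasing.
    """
    rescaled = [rescale_profile(x, G, n, origin) for n, (x, G) in profiles.items()]
    stacked = np.array([np.interp(grid, xs, Gs) for xs, Gs in rescaled])
    return float(np.max(stacked.max(axis=0) - stacked.min(axis=0)))
```

A spread that is small compared with the experimental error on $G(x)$ would indicate that the profiles for different $N_d$ collapse onto a single per-monomer curve.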

The shape of the resulting free energy profile $G(x)$ per ubiquitin monomer
in Fig. 4A at 10 pN is interesting because of the absence of a barrier: the
experimental collapse corresponds to a diffusive slide on a plateau in the free energy
that accelerates as the end-to-end length reaches a value ~5 nm away from $x_f$.
Lowering the force to 5 pN eliminates the plateau and promotes a
downhill collapse that is limited by friction alone, which is roughly consistent
with the prediction of the tilt by the Bell model, also shown in Fig. 4A.
Similar features of ubiquitin monomer trajectories under a quench force of 10 pN
were interpreted in terms of a physical model that predicts a free energy profile
with a barrier of $2.5\,k_B T$ [?]. Since tilting the profiles in Fig. 4A by the Bell
model [22] to just 13 pN leads to a barrier to collapse of the same height, this
small difference in the quench force could explain the observed change in the
profile. However, the functional form of the landscape proposed in [?] does not
fit the free energy profiles accurately due to its propensity to form barriers over
a wide range of quench forces.
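The Bell tilt used in this comparison adds a term linear in the extension to the profile when the force changes. A minimal sketch follows, assuming $G$ is expressed in units of $k_B T$, forces in pN, lengths in nm, and $k_B T \approx 4.11$ pN nm (room temperature, an assumption not stated in the text). The function name is hypothetical.

```python
import numpy as np

KBT_PN_NM = 4.11  # k_B T in pN nm, assuming roughly room temperature

def bell_tilt(x, G_kT, F_from, F_to, x_f):
    """Tilt a profile measured at force F_from to the force F_to.

    The work term -F (x - x_f) changes by -(F_to - F_from) (x - x_f).
    """
    x = np.asarray(x, float)
    return np.asarray(G_kT, float) - (F_to - F_from) * (x - x_f) / KBT_PN_NM

# Example: an idealized flat plateau at 10 pN over a 20 nm monomer length
# becomes a downhill slope towards x_f when tilted to 5 pN.
x = np.linspace(4.0, 24.0, 101)
G_5pN = bell_tilt(x, np.zeros_like(x), F_from=10.0, F_to=5.0, x_f=4.0)
```

Under these assumptions a 5 pN reduction in force over 20 nm of extension corresponds to a drop of $5 \times 20 / 4.11 \approx 24\,k_B T$, which is why a small change in the quench force reshapes the landscape so strongly.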

A better fit is achieved using the physical model proposed for the collapse
of RNA molecules in [?], which is based on the sum of the entropic worm-like
chain model, the work done on the protein and the enthalpic 'expanding sausage'
model for polypeptide collapse [31]:

$$G(x) = G_{\mathrm{WLC}}(x;\, l_p, L_c) + \frac{2 k_B T}{\xi^2} \sqrt{\pi \Omega\, (x - x_f)} - F\,(x - x_f) \qquad (4)$$

Here $F$ is the applied force, $L_c$ and $l_p$ are the contour and persistence lengths
of the extended protein, respectively, $\Omega$ is the volume of the sausage, and $\xi$
is the size of a globule inside the sausage [31]. The adjustable parameters in
Eq. (4) are $l_p$, $L_c$, and the ratio $\sqrt{\Omega}/\xi^2$. Fits to $G(x)$ in Fig. 4A give $L_c = 26$ nm,
close to the value predicted by the size of a ubiquitin monomer
(76 residues × 0.36 nm = 27.4 nm) [26], and
$l_p = 0.82$ nm at 5 pN and 1.45 nm at 10 pN, in agreement with chain stiffening
along the backbone due to intramolecular interactions [?]. To obtain the values
of $\Omega$ and $\xi$ from their ratio given by the fit, we assume that the size of the
individual monomers in the sausage is $l_p$. This implies that the number $N$ of
these monomers must be $N = L_c/l_p$. Following de Gennes' argument, we then
set $\xi = l_p \sqrt{g}$ and $\Omega = L_c \pi \xi^2 = L_c \pi l_p^2 g$, where $g$ is the number of monomers
inside each globule and becomes the fit parameter that replaces the ratio. Fits
to $G(x)$ in Fig. 4A thus yield $\xi = 2.6$ nm at 5 pN and 2.7 nm at 10 pN, in rough
agreement with the value $\xi = 2$ nm estimated for the hydrophobic collapse [32],
and $\Omega = 203.62$ nm$^3$ at 5 pN and 373.90 nm$^3$ at 10 pN. Note that the above
argument does not affect the quality of the fits; it simply gives an interpretation
of the parameters in Eq. (4), indicating that the microscopic packing of blobs
inside the initial sausage is different for the two quench forces. Note also that
the functional form of this free energy is consistent with the scaling with $N_d$
in Eq. (3), since the volume of a polyprotein with $N_d$ domains is $N_d \Omega$ and its
contour length is $N_d L_c$, while all the other parameters in Eq. (4) are unaffected
by $N_d$. Altogether, these results give a quantitative validation of the physical
model underlying the collapse.
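The blob picture above can be made concrete with a short calculation. The sketch below only evaluates the two relations $N = L_c/l_p$ and $g = (\xi/l_p)^2$ (the latter from $\xi = l_p\sqrt{g}$) for the fitted values quoted in the text. The function name is hypothetical, and the calculation does not attempt to reproduce the quoted values of $\Omega$.

```python
def blob_parameters(L_c, l_p, xi):
    """Return (N, g): N = L_c / l_p monomers of size l_p per chain,
    and g = (xi / l_p)**2 monomers per blob of size xi = l_p * sqrt(g)."""
    N = L_c / l_p
    g = (xi / l_p) ** 2
    return N, g

# Fitted values quoted in the text: L_c = 26 nm at both forces.
for force_pN, l_p, xi in [(5, 0.82, 2.6), (10, 1.45, 2.7)]:
    N, g = blob_parameters(26.0, l_p, xi)
    print(f"{force_pN} pN: N = {N:.1f}, g = {g:.1f}")
```

With these inputs the number of monomers per blob comes out roughly three times larger at 5 pN than at 10 pN, which is one way to read the statement that the packing of blobs differs between the two quench forces.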

The collapsing traces can also be used to calculate the diffusion
coefficient $D(x)$ on the reconstructed landscape and thereby verify our assumption
that it is constant, $D(x) \approx D$. The idea is to replace $x_f$ by any $x \in [x_f, x_u]$ in
the procedure, i.e. cut the traces from $x_u$ until the first moment they reach $x$ and
recalculate their non-equilibrium probability density $p$. The probability flux of
these traces through the end-point $x$ can be expressed in two ways: it is given
by $D(x)\,p'(x)$, and it can also be estimated directly as $1/\tau_c(x)$, where $\tau_c(x)$ is
the average time it takes them to collapse from $x_u$ to $x$. Equating these two
expressions and solving for $D(x)$ gives

$$D(x) = \frac{1}{\tau_c(x)\, p'(x)} \qquad (5)$$

This estimator for $D(x)$ is new and it has the advantage over the standard one
using the quadratic variation of the trajectory [18] that it is insensitive to the time
resolution of the instrument. Because of the small number of traces per $N_d$
per force (~15), the estimate for $D(x)$ is accurate only over the plateau regime in
the end-to-end length in the data set at 10 pN, and not in the drift-dominated
parts of the landscape. The results obtained for polyproteins with different $N_d$
in Fig. 4B are in good agreement with each other, within the experimental
error, and show that the diffusion coefficient is roughly constant as a function
of $x$, consistent with the assumption made in Eq. (1). This is a surprising
result because the 'expanding sausage' model predicts a $1/x$ scaling of $D(x)$
due to an increase in the viscous friction as the molecule collapses to a blob
of a growing radius. By contrast, here the protein dynamics is governed by
internal rather than solvent friction, which agrees with recent single molecule
experiments that show a friction that is independent of the end-to-end length of a
folding protein [33, 34]. Notice also that the average value of 170 nm$^2$/s is orders
of magnitude smaller than the values associated with the typical vibrational modes
of a protein [35]. This
indicates that the projection of all the degrees of freedom of the molecule onto a
single reaction coordinate manifests itself as a very slow diffusion. It is likely that
many local barriers in other degrees of freedom (associated with the formation
of e.g. loops or helices at the same end-to-end length) can be mimicked by an
effective diffusion constant.
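The estimator in Eq. (5) can be sketched in a few lines. This is an illustrative implementation under assumptions, not the authors' code: traces are taken to be uniformly sampled with time step `dt`, and $p'(x)$ at the absorbing end-point is approximated from the first histogram bin using $p(x) = 0$ there. The function name and bin count are hypothetical.

```python
import numpy as np

def estimate_diffusion(traces, dt, x, x_u, n_bins=20):
    """Estimate D at the end-point x from traces that start at x_u (Eq. 5)."""
    pooled, times = [], []
    for tr in traces:
        tr = np.asarray(tr, float)
        hit = np.flatnonzero(tr <= x)
        if hit.size == 0:
            continue                      # this trace never reaches x
        k = hit[0]
        pooled.append(tr[:k])             # positions before first passage
        times.append(k * dt)              # first-passage time from x_u to x
    if not pooled:
        raise ValueError("no trace reaches x")
    samples = np.concatenate(pooled)
    tau_c = np.mean(times)
    edges = np.linspace(x, x_u, n_bins + 1)
    p, _ = np.histogram(samples, bins=edges, density=True)
    # p vanishes at the absorbing end-point, so the slope there is roughly
    # the first-bin density divided by the distance to the bin center.
    dp = p[0] / (0.5 * (edges[1] - edges[0]))
    return 1.0 / (tau_c * dp)
```

Repeating the call for a range of end-points $x$ gives an estimate of $D(x)$ along the reaction coordinate. With few traces the first-bin slope is noisy, so a fit of $p$ over several bins near $x$ may be preferable in practice.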

To verify our results, we generate artificial traces using Eq. (1) with the
estimated $G(x)$ and $D$ and show that they are in excellent agreement with
experimental traces in Fig. 2. In addition, the fact that traces generated using
$D$ derived at 10 pN reproduce the spread of times to collapse and the noise
fluctuations in the experimental traces at 5 pN suggests that $D$ does not change
with the quench force. We estimate that ~70% of experimental trajectories are
consistent with the 1-D diffusive model, while the outliers do not agree with the
synthetic distribution of collapse times. Such trajectories have been observed
previously [?], and they highlight the importance of other degrees of freedom.
Nevertheless, the simulated and the experimental average times to collapse $\tau_c$
agree very well at both quench forces and for all $N_d$, as shown in the inset in
Fig. 4B. By contrast, a linear scaling with $N_d$ of a barrier-limited $G(x)$ [5] would
lead to a much steeper dependence of $\tau_c$ on $N_d$, which is inconsistent with our
and other published polyprotein data.

This general non-equilibrium method to analyse single molecule trajectories 
has allowed us to reconstruct free energy profiles, assess the dynamics along 
the measured reaction coordinate and thus postulate a physical model for the 
collapse of ubiquitin proteins. This technique paves the path for a mechanistic 
approach to many complex problems, such as protein folding. 

Figure 1: A typical force-clamp trajectory of the unfolding and refolding of a
polyubiquitin chain with $N_d = 3$ domains. A second pull at 110 pN reveals a
staircase as a signature that the protein domains refold.

Figure 2: Collapsing trajectories are grouped by their total length ($N_d = 3$)
and aligned at the time of the force quench to 10 pN in (A) and 5 pN in (B). The
experimental trajectories are compared with those generated by simulations of
diffusive dynamics on the reconstructed free energy profiles.

Figure 3: The nonequilibrium distribution $p(x)$ for each $N_d$ collected at a force
quench of 10 pN in (A) and 5 pN in (B). The linear rescaling by $N_d$ is shown in
the inset, which indicates a cooperative mechanism for the collapse.

Figure 4: (A) Experimental free energy profiles as a function of the
end-to-end length, rescaled by $N_d$. (B) Diffusion coefficients derived from Eq. (5)
at 10 pN (solid lines) compare well with those estimated from the free energy
reconstruction (dashed lines). The inset shows the dependence of $\tau_c$ on $N_d$ (squares),
consistent with simulated data (circles).